Total variation minimization for stable multidimensional signal 

recovery 



Consider the problem of reconstructing a multidimensional signal from partial information. 
Without any additional assumptions, this problem is ill-posed. However, for signals such as 
natural images or movies, the minimal total variation estimate consistent with the measure- 
ments often produces a good approximation to the underlying signal, even if the number of 
measurements is far smaller than the ambient dimensionality. While reconstruction guarantees 
and optimal measurement designs have been established for related ^i-minimization problems, 
the theory for total variation minimization has remained elusive until recently, when guarantees 
for two-dimensional images x G C w were established. This paper extends the recent theoret- 
ical results to signals x G C N of arbitrary dimension d > 2. To be precise, we show that a 
multidimensional signal x G C N can be reconstructed from 0(sd log(N d )) linear measurements 
y = Ax using total variation minimization to within a factor of the best s-term approximation 
of its gradient. The reconstruction guarantees we provide are necessarily optimal up to poly- 
nomial factors in the spatial dimension d and a logarithmic factor in the signal dimension N . 
The proof relies on bounds in approximation theory concerning the compressibility of wavelet 
expansions of bounded- variation functions. 

1 Introduction 

Compressed sensing (CS) is a new signal processing methodology where signals are acquired in 
compressed form as undersampled linear measurements. The applications of CS are abundant, 
ranging from radar and error correction to many areas of image processing [13]. The underlying 
assumption that makes such acquisition and reconstruction possible is that most natural signals 
are sparse or compressible. We say that a signal x £ C p is s-sparse when 



Compressible signals are those which are well- approximated by sparse signals. More generally, a 
signal x G C p is said to be s-sparse with respect to a basis B when x can be represented as a 
linear combination of s atoms from B. In the CS framework, we acquire mCp nonadaptive linear 
measurements of the form 
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y = M(x) + £, 

where M : C p — > C m is an appropriate linear operator and £ is vector modeling additive noise. The 
theory of CS [H [16] ensures that under suitable assumptions on the measurement operator M., a 
sufficiently compressible signal can be accurately approximated by the signal of minimal £i-norm 
consistent with the measurements, 

x = argmin ||iu||i such that \\A4(w) — y\\2 < £, (Xi) 

w 

1/2 

where ||iu||i = J2i \ w i\ an d \\ w \\2 = \ w i\ 2 ) denote the standard l\ and Euclidean norms, and 
e bounds the noise level ||£||2 < £■ The program (Li) may be cast as a second order cone program 
(SOCP) and can be solved efficiently using standard convex programming methods [10} I14j. 

To guarantee robust recovery of compressible signals via (L\), Candes and Tao introduced in [6] 
the restricted isometry property (RIP) for a measurement operator M. 

Definition 1. A linear operator Ai : C p — > C m is said to have the restricted isometry property 
(RIP) of order s G N and level 5 G (0, 1) if 

(1 - 6)\\x\\l < \\M(x)\\l < (1 + 5)\\x\\l for all s - spars e x G C p . (2) 

Many distributions of random matrices of dimension m xp are known to generate RIP matrices of 
order s and level 5 < c < 1 for m ~ 5~ 2 s(logp) 4 . Representative families of random matrices include 
randomly subsampled rows from the discrete Fourier transform [34] or from a bounded orthonormal 
system more generally [Ml ESI EH [32l 12] , and randomly-generated circulant matrices [2jJ. Moreover, 
a matrix whose entries are independent and identical (i.i.d.) realizations of a properly-normalized 
subgaussian random variable will have the RIP with probability exceeding 1 — e~ cm once m ~ 

5~ 2 siog{ P /s) [3 m\ m in. 

Candes, Romberg, and Tao [5] showed that when the measurement operator M has the RIP of 
order 0{s) and sufficiently small constant 5, the program (L\) recovers an estimation x to x that 
satisfies the error bound 

\\x-x\\ 2 <C ^ X ~J sh (3) 

where x s denotes the best s-sparse approximation to the signal x. Using properties about Gel'fand 
widths of the i\ ball due to Kashin [19] and Garnaev-Gluskin [IT], this is the optimal minimax 
reconstruction rate from m ~ slog(p/s) nonadaptive linear measurements. Due to the rotational- 
invariance of an RIP matrix with randomized column signs [22], a completely analogous theory 
holds for signals that are compressible with respect to a known orthonormal basis or tight frame 
D by replacing w with D*w inside the £i-norm of the minimization problem (L\) [3]. 

1.1 Imaging with CS 



Natural images are highly compressible with respect to their gradient representation. A typical 
grayscale digital image, regarded as a signal x G , consists primarily of slowly-varying pixel 
intensities, with large jumps in intensity occurring only along edges. Figure [U illustrates the gradient 
sparsity of a representative image x G along with its discrete directional derivatives, 

x u : C N * N -+ C^ N -^ N , (x u ) j>k = x j+1)k -x j>k (4) 

X v . C y C ^ , (p^v)j,k — •Ej,k+l "^j,k (5) 
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Figure 1: An image, along with its horizontal and vertical discrete directional derivatives. 



The discrete gradient transform V : C^ 2 — > (£ NxNx2 [ s defined in terms of the directional 
derivatives, 



((x u ) j>k ,(x v ) j)k ), l<j<N-l, l<k<N-l 

(0,(x v ) j>k ), j = N, l<k<N-l 

((x u ) j>k ,0), k = N, l<j<N-l 

(0,0), j = k = N 



The ^i-norm of the discrete gradient defines a seminorm for the space , often referred to as the 
total variation seminorm and denoted by 

II^IItv = ||Vsc||i. (6) 

Due to the gradient sparsity of natural images, it should not be surprising that the total variation 
minimization program 

x = argmin H^Htv such that ||-A/f (2) — y\\2 < e. (TV) 

z 

is often used for image reconstruction, in setting of compressed sensing and more broadly in imaging 
applications such as denoising, deblurring, and inpainting (see e.g. [U El El [30], EJ Ell ESJ [23j [29j [26j 
[T8 | 120 1 [28] and the references therein). While (TV) is similar to the ^i-minimization program (L\), 
the RIP-based theoretical guarantees for (L\) do not directly translate to recovery guarantees for 
(TV) because the gradient map z — > ~Vz is not well-conditioned. In fact, viewed as an invertible 
operator over mean-zero images, the condition number of the gradient map grows linearly with 
the signal side-length N. Recovery guarantees for (TV) were nevertheless obtained in [28] for the 
setting of two-dimensional images x G by showing that the gradient map is well-conditioned 
when restricted to signals lying in null space of a matrix with the restricted isometry property: 

Theorem A (from |28j). There are choices of linear operators Ai : C^ 2 — > C m with m ~ 
s log(iV 2 /s) for which the following holds for any image x £ : Given noisy measurements 
y = A4(x) + £ with noise level ||£||2 < £, the total-variation minimizing signal 

x = argmin ||z||tv such that \\A4(z) — 3/ 1 1 2 < e (7) 

z 

satisfies the error bound 

\\x-A\\ 2 < Clog(N 2 /s) ^ X -^ x)sh + e) , (8) 
where here and throughout, z s denotes the best s-term approximation to the array z. 
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In words, the total-variation minimizer estimates x to within a factor of the noise level and 
best s-term approximation of its gradient. The bound in ([8]) is optimal up to the logarithmic factor 
log(iV 2 /s). 

The contribution of this paper is to extend Theorem to multidimensional signals x G 
of arbitrary dimension d > 2. We show that the signal x G of minimal (ti-dimensional) 

total variation seminorm consistent with m ~ sd\og(N d ) appropriately-chosen linear measurements 
y = M.{x) + £ will approximate x to within a factor of the noise level and the best s-term 
approximation to the (<i-dimensional) discrete gradient of x, modulo a single logarithmic factor in 
the signal dimension N d . In particular, the results of this paper provide guarantees on the power 
of total variation minimization in reconstructing three-dimensional digital movies, which represent 
sequences of gradually-changing images and thus have compressible (three-dimensional) discrete 
gradient. 

Our proof rests on extending the Sobolev inequalities for random subspaces from [28] to higher- 
dimensional signal structures, using bounds of Cohen, Dahmen, Daubechies, and DeVore in [TT] on 
the compressibility of wavelet representations for functions of bounded variation. Unfortunately 
these bounds, and hence our results for total variation, do not hold in dimension d = 1; guarantees 
on the fidelity of total variation minimization in the one-dimensional setting remains an interesting 
open problem. 

1.2 Organization 

The article is organized as follows. In Section [2] we recall relevant background material on the 
multidimensional total variation seminorm and multidimensional orthonormal wavelet transform. 
Section [3] states our main result: total variation minimization provides stable signal recovery for 
signals of arbitrary dimension d > 2. The proof of this result will occupy the remainder of the 
paper; in Section [3] we prove that the signal gradient is recovered stably, while in Section [5] we pass 
from stable gradient recovery to stable signal recovery using the strengthened Sobolev inequalities 
for random subspaces. The proofs of propositions and theorems used along the way are contained 
in the appendix. 

2 Preliminaries for multidimensional signal analysis 

The setting for this article is the space C Nd of multidimensional arrays of complex numbers, con- 
sisting of elements 

x = (x a ) G C Nd , a = (a 1 ,a 2 ,...,a d ) G [N] d , 
where [N] d = {1, 2, . . . , N} d . The space C Nd is a Hilbert space using the standard inner product 

( x ,y) = Yl Xa ■ y°" ( 9 ) 

a£[N] d 
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and is isometric to the subspace Xyv of ^([0, l) d )0 consisting of functions which are constant over 
cubes of side length N^ 1 , 

^ = {/eL 2 ([fl,i) J ),/(«) = / a , Sa^l< Ui< ^\ (10) 

For x = (x a ) G C Nd , the isometry is provided by identifying f a = f£ = N d / 2 x a . More generally, 
we denote by \\x\\ p = ( Yl a ^[N] d \ x a\ p ) the entrywise £ p -norm of the signal x. 

For I = 1,2, ... d, the discrete derivative of x in the direction of rg is the array x r£ € x(JV-i)xiV 
defined component-wise by 

( Xr e)a ~ I (ai,a2 r ..,o,+l r .,a (i ) ~~ x (ai ,012 ,...,a t ,...a d ) 1 (H) 

and we define the ci-dimensional discrete gradient transform V : C^ d — > <C dxNd through its com- 
ponents 

(Vx) e d = f ( ^ "i^N-l, 

y Ji > a \ 0, else v ; 

The d-dimensional total-variation seminorm is the £i-norm of the d-dimensional discrete gradient, 

II^IItv = ||Va;||i. (13) 

A linear operator A : C N<i — > C p can be represented as a sequence of multidimensional arrays 
A = (afe). The linear operation y = A(x) has component- wise action 

y k = [A(x)] k = (a k ,x) (14) 

where the inner product between multidimensional arrays is defined in ([9]). A linear operator 
A : — > C Nd can be expressed similarly through its components y a = [A(x)] a = (& a ,x). 

Finally, we introduce the row direct sum operation for concatenating linear operators: if A : 
C Nd -». C ri and B : C Nd -> C7 2 then M = A r B is the linear operator from C Nd to C ri+r2 with 
component arrays Ai = (mk) V k=i 2 gi ven by 



ojfc, i < k < n, 

bjfc_ ri , 1 + ri < k < r 2 



Alternatively, the column direct sum operation for concatenating linear operators A : C Nd — > C p 
and B : C N — > C p is the linear operator M. = A ® c B from C 2xN to C p with component arrays 
M = {m k f k=l given by 

(a fc ) a , £ = 1, 
(6fe)a, ^=2 



(™>k)e,c 



1 Recall that / £ L-2(Q) if /q |/(m)| 2 c!u < oo, and L2(Q) is a Hilbert space equipped with the inner product 
(f,9) = f Q /(«) ■ g(u)du 
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2.1 The multidimensional Haar wavelet transform 



The Haar wavelet transform provides a sparsifying basis for natural signals such as images and 
movies, and is closely related to the discrete gradient. The Haar transform plays an important role 
in our analysis for passing from the ^-theory of compressed sensing to theory for total variation 
minimization. For a comprehensive introduction to wavelets, we refer the reader to |15j . 

The (continuous) multidimensional Haar wavelet basis is derived from a tensor-product repre- 
sentation of the univariate Haar basis, which forms an orthonormal system for square-integrable 
functions on the unit interval and consists of the constant function 



h°(u) 

the step function 



1 < t < 1, 
0, otherwise, 



h l {u) 



1 0<t<l/2, 
-1 1/2 < * < 1, 

and dyadic dilations and translations of the step function, 

h jfk (t) = 2^ 2 h\2H - k); j G N, < k < 2 j . (15) 

The Haar basis for the higher dimensional space L2{Q) of square-integrable functions on the unit 
cube Q = [0, l) d consists of tensor-products of the univariate Haar wavelets. Concretely, for 
V = {0, l} d — {0} d and e = (ei, e%, . . . , e^) G V, we define the multivariate functions 

h e {u) = Y[h e *( Ui ). 

ei 

The orthonormal Haar system on -^(Q) is then comprised of the constant function along with all 
functions of the form 

hl k (u) = 2 jd / 2 h e (2 j u -k), eeV, j>l, k e Z d n 2 j Q. (16) 

The discrete multidimensional Haar transform is derived from the continuous construction via 
the isometric identification (llOj) between and Sjy C L2(Q): defining 

h jAe (a) = N- d / 2 hl k (a/N), a£[N] d , (17) 

the matrix product computing the discrete Haar transform can be expressed as T~Lx = ((fij^e, x))j k e . 
Note that with this normalization, the transform is orthonormal. 



2.2 Gradient versus wavelet sparsity 

Since the multivariate Haar transform H : C Nd — > C Nd is orthonormal, standard results in com- 
pressed sensing (e.g. [8]) guarantee that by minimizing the £i-norm of the conjugate transpose or 
inverse Haar transform, 

x = argmin ||'H*(z)||i such that — y\\2 < e (^i-Wav) 

z 

a multidimensional signal can be reconstructed from m > Cs log(N d /s) measurements to within a 
factor of its best s-term approximation in the Haar basis. A straightforward calculation verifies that 
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a signal which is s-sparse with respect to the discrete gradient is s log(iV)-sparse with respect to 
the Haar transform. Moreover, this relationship between gradient and wavelet sparsity is stable, as 
can be seen from the following corollary of results from [11] on the decay of wavelet representations 
associated to functions of bounded variation: 

Proposition 2 (Corollary of Theorem 1.1 from [H]). There is a universal constant C > such that 
the following holds for any x € C^ d in dimension d > 2: if the Haar transform coefficients c = H(x) 
are partitioned by their support into blocks cj^ = ((hj^, e , x ))eev of cardinality \cj k\ = 2 d — 1, then 
the coefficient block of kth largest i2-norm, denoted by cn^A, has l2-norm bounded by 



< c- 



x TV 



k ■ 2 d / 2 " 1 ' 

Proposition [21 whose derivation from Theorem 1.1 of [11] is outlined in the appendix, suggests 
that (^i-Wav) should provide reasonably accurate reconstructions for natural signals in the context 
of compressed sensing. 

However, as illustrated in Figure [2] below, the minimizing solutions to (^i-Wav) are known to 
produce high-frequency artifacts; empirically, the minimizers by total variation tend to produce 
better reconstructions. 



(Original) 



(TV) 



-Wav) 




Figure 2: Frame-by-frame comparison between (TV) and (£i-WAV) reconstruction on a 16 x 16 x 16- 
pixel MRI movie from m = 512 i.i.d. Gaussian measurements. 



Theoretical justification for the good empirical performance of total variation minimization is 
clearly desirable, and this will occupy the remainder of the present article. 



3 The main result 

Our main result concerns near-optimal recovery guarantees for multidimensional total variation 
minimization from compressed measurements. Recall that a linear operator A : — > C p is said 
to have the restricted isometry property (RIP) of order s and level 5 € (0, 1) when 

(1 - S)\\x\\l < \\A(x)\\l < (1 + 5)\\x\\ 2 2 for all s-sparse x G C^. (18) 

A linear operator A = (a^) : — > C p satisfies the RIP if and only if the p x N d matrix A whose 
klh. row consists of the unraveled entries of the /cth multidimensional array satisfies the classical 
RIP, ([1]), and so without loss we treat both definitions of the RIP as equivalent. 
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Analogous to the standard theory of compressed sensing, a lower bound on the optimal re- 
covery rate for an estimate x = A(ftA(x)) as a function of m > slog(N d /s) nonadaptive linear 
measurements of x is given by 

Il---ll2<^ ^ + £ J ; (19) 

where (Va;) s is the best s-sparse approximation to the discrete gradient Vx. Indeed, if a better 
reconstruction rate were possible, then because ||V(i; — x)\\2 < 2d||i — x\\2, we would arrive at a 
better reconstruction rate for ||Vi — Vcc||2 than that given by the optimal rate ((3|). Here we have 
used the notation u > v (analogously u < v) to indicate that there exists some absolute constant 
C > such that u > (u < Cv). 

For our main result it will be convenient to define for a multidimensional array a G 1>c ( N ~ 1 ) xN 
the associated arrays ao £ G and a 0f G obtained by concatenating a block of zeros to the 
beginning and end of a oriented in the Ith direction: 

^ = I a 2<a < N ™ 

I a ai ,...,a e -i,...,a d , 2 < < iv 

and 

ia ^ a = \a a , l<a e <N-l (21) 

The following lemma relating gradient measurements with a to signal measurements with a° e and 
ao £ can be verified by direct algebraic manipulation and thus the proof is omitted. 

Lemma 3. Given x G and a G C^ 1 *^" 1 )*^"' , 

(a,x re ) = (a 0f ,x) - (a 0e ,x) , 

where the directional derivative x re is defined in ([3]). 

For a linear operator A = (a^) : 1 x(N—i)xN d 1 _^ ^m. we define fa e operators A° e : — > 
C m and Ao e : C Nd — > C' m as the sequences of arrays (a^))™^ and (ao fefc )^! =1 , respectively. From 
Lemma [22], A(x n ) = A° e (x) - A 0e {x). 

We are now prepared to state our main result which shows that total variation minimization 
yields stable recovery of iV^-dimensional signals from RIP measurements. 

Main Theorem. Let N = 2 n . Fix integers p and q, and let % : C Nd — > C Nd be the orthonormal 
Haar wavelet transform, and let A : C N — > C p be such that the composite operator A%~ 1 : 
C N — > C p has the restricted isometry property of order 2s and level S < 1. Let B\,B 2 , ... ,B d with 
Bj : C Nd '^ N ^ C q be such that B = B ± ® C B 2 ffi c - ■ • ® c B d : C^ -1 ^ 1 ' C dq has the restricted 
isometry property of order 5s and level 5 < 1/3. Set m = 2dq + p, and consider the linear operator 
M : C Nd C m given by 



d- 



M = A e r [Bi] 01 e r [Si] 0i e r • • • e r [B t ] 0e e r [B e ] 0e ® r ---® r [B d ] 0d ® r [B d ] 0d . (22) 

The following holds for all x G C Nd : From noisy measurements y = Ai(x) + £ with noise level 
< £, the solution to the total variation minimization program, 

x = argmin ||o:||tv such that \\Ai(z) — y\\2 < e (23) 

X 

satisfies: 



S 



i. \\V(x-x)\\2< llVx - { J s xUl +Vde, 

ii. \\x — x\\tv ^ II Va; — (Vx) a ||i + vide, 

Hi. \\x-x\\ 2 <log{N d )( 11 Vx - ( v*)sh + ^d £ y 

Remarks. 

1. A number of m > sdlog(N d ) i.i.d. and properly normalized Gaussian measurements can be 
used to construct the measurement operator Ai which, with high probability, satisfies the required 
RIP conditions of the theorem [HE]. From this number m of measurements, the error guarantees 
i and ii are optimal up to the factor of y/d on the noise level ,and the error guarantee Hi is optimal 
up to logarithmic factors in the signal dimension N d . 

2. When d = 2, the main theorem recovers the total variation stability guarantee of [2H] up to 
a log(l/s) term. 

3. The requirement of sidelength N = 2 n is not an actual restriction, as signals with arbitrary 
side-length iV can be extended via reflections across each dimension to a signal of side-length 
N = 2 n without increasing the total variation by more than a factor of 2 d . This requirement again 
seems to be only an artifact of the proof and one need not perform such changes in practice. 



We now turn to the proof of the main theorem. We follow the method of proof introduced 
in [28], proving stable gradient recovery and then translating these guarantees to stable signal 
recovery via Sobolev inequalities for incoherent subspaces. 



4 Stable gradient recovery 

In this section we prove statements (i) and (ii) of the main theorem concerning stable gradient 
recovery, using standard results in the l\ theory of compressed sensing combined with a summation 
by parts "trick" provided by Lemma [3j 

Recall that when a signal obeys a tube and cone constraint we can bound the norm of the entire 
signal, as in [8]. We refer the reader to Section A.l of [28] for a complete proof. 

Proposition 4. Suppose that A is a linear operator satisfying the restricted isometry property of 
order 5s and level 5 < 1 /3, and suppose that the signal v satisfies a tube constraint 

\\AMh<e. 

Suppose further that for a subset S of cardinality \S\ = s (with complement S c ), v satisfies a 
cone- constraint 

||v<H|l < ||vs||i + £. (24) 

Then 

l|v|| 2 <4= + e ( 25 ) 

and 

l|v||i<£ + V^- (26) 
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Using Proposition H] and the RIP assumptions on the operator B, stable gradient recovery (i) 
and (ii) reduce to proving that the discrete gradient of the residual signal error satisfies the tube 
and cone constraints. 

Proof. (Main Theorem, statements (i) and (ii).) Let v = x — x be the residual error. Let B = (&&)• 
Then we have 

Cone Constraint. Let S denote the support of the best s-sparse approximation to Va; and let 
S c be the complement of S. Since x = x — v is the minimizer of (TV) and x satisfies the 
feasibility constraints in (TV), ||Vjc||i < ||Va:||i and by the reverse triangle inequality, 

\\(Vx) s \\ l - IKVvJsHx - ||(VaO<H|i + ll(Vv) 5 c||i 

< ||(Vac)s + (Vv) s ||i + ||(Vsb) S c + (Vv) 5 c|| 1 

= ll v ^lli 

< ||Vx||i 

= ||(Vx) s ||i + ||(Va;)5=||i 

This yields the cone constraint 

||(Vv)<Hli < ||(Vv) s ||i + 2||(Vx) S c||i. 

Tube constraint. Recall that v = x — x. Since both x and x are feasible solutions to (TV), the 
triangle inequality gives 

ll-M(v)||! < \\M(x)-y\\l + \\M(x)-y\\l 
< Ae 2 

By Lemma El we have for each component operator Bj, 

^K) = [B;]°'(v) ^ ,,iv) (27) 
Then S(V«) = Y2j=i Bj( v rj), (where we assume that Vi> is ordered appropriately) and 

d 

\\B(Vv)\\l = W^Bj^g 



d 



< 



d^ll^KOIIl 



d 

< 2^(||S i ]°^v)||! + ||^]o,(v)||i) 

i=i 

< 2d\\M(v)f 2 

< 2de 2 . (28) 



In light of Proposition [5] this completes the proof. 



□ 
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Remark 5. The component operator A from the main theorem was not used at all in deriving 
properties (i) and (ii); on the other hand, only the measurements in A will be used to derive 
property (in) from (i) and (ii). 

5 A Sobolev inequality for incoherent subspaces 

The purpose of this section is to derive the following bound for signals lying near the null space of 
an incoherent matrix. 

Theorem 6 (Sobolev inequality for incoherent subspaces). Let d > 2 and let N = 2 n . Let 

% : C Nd —7- be the multivariate Haar wavelet transform. Let B : C Nd — > C m be a linear 

map with the property that BH -1 : C Nd — > C m satisfies the restricted isometry property of order 
2s and level 5 < 1. Then there is a universal constant C > such that for any signal x £ 
satisfying the tube constraint ||23(a:)||2 < £, 

||*|| 2 <c(M^)log(iV d ) + e . (29) 

Note that Theorem [6] admits various corollaries for various families of random matrices with 
restricted isometries. For subgaussian random matrices, we have using the results in [T] 

Corollary 7. Let B : C Nd — > C m be a linear map realizable as an N d x m matrix whose entries 
are mean-zero i.i.d. Gaussian random variables. Then with probability exceeding 1 — e~ cm , the 
following bound holds for any x £ lying in the null-space of B: 

||x|| 2 <(^)[log(iV d )] 2 . (30) 

The proof of Theorem [6] goes as follows: as revealed through the proof of Proposition any 
signal in the null space of an RIP matrix must be relatively flat, in that the norm of its best s-term 
approximation is bounded by the norm of the remaining terms. At the same time, Proposition [2] 
implies that the sequence of wavelet block-coefficient norms ||c(fc)||2 in the null space of BTL.^ 1 must 
be sufficiently compressible and bounded by the total- variation of B. Combining these properties 
- flatness and tail-compressibility — produces the Sobolev- type inequality ([6]). 

Proof of Theorem® Let c = T~L(v) £ C Nd represent the Haar transform of the signal error v = 
x — x. Suppose without loss that the desired sparsity level s is either smaller than 2 d — 1 or a 
positive multiple of 2 d — 1, and write s = p(2 d — 1) where either p £ N or p £ (0, 1) (for arbitrary 
s £ N, we could consider s' = \s/(2 d — 1)] which satisfies s' < 2s). 

Let S = Sq C [N] d be the set of s largest-magnitude entries of c, let S± be the set of s largest- 
magnitude entries of c in [N] d \ Sq, and so on. Note that c$ and similar expressions below can 
have both the meaning of restricting c to the indices in S as well as being the array whose entries 
are set to zero outside S. 

By definition, ||cs ||i is at least as large as ||cn||i for any other Q C [N] d of cardinality s. 
Consequently, llcgc]^ is smaller than Hc^cHx for any other f2 C [N] d of cardinality s. Now, recall 
the alternative decomposition of c from Proposition [2] into blocks of cardinality 2 d — 1, grouped 
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according to the support of the corresponding wavelets. Because s = p(2 d — 1) for either p G N or 

P e (0,1), 



C SS 1 





fc>l 



Y \\ c s k 



^ Y H c (i)lli 
j>p+i 

< ^ || C(i) || 2 



i>p+i 

A"' 



^5 II v I|tv 7 ky Proposition [2] 



< ||^||rylog(iV d ), (31) 

where the last inequality follows from properties of the geometric summation. 
We use a similar procedure to bound the ^2-norrn of the residual, 

l|cs g ||| < Y \\ C U)W 2 2 

llvll 2 Nd 1 
< \\v\\tv \ ^ ± 



2 d ^ £ 2 



< (llvllw) 2 



max (l,p) 

2 



(32) 

S 

Then, ||c S g|| 2 < ||v||tv/\/s. 

By assumption, v satisfies the tube constraint ||£>(v)||2 < e and B'Hr 1 satisfies the restricted 
isometry property. We conclude that 

e > Wv)\\ 2 = \\BH- 1 {c)h 

r 

> \\Bn~\c So + c Sl )h - Y WWHcsJh 

k=2 

r 

> (1 - 5)\\c So + c Sl \\ 2 - (1 + S) lies, lb 

k=2 

> (l-5)|| CSo || 2 -(l + 5)^=11^11!, (33) 

the last inequality holding because the magnitude of each entry in the array cs h is smaller than the 
average magnitude of the entries in the array cs k _ 1 - Along with the tail bound (|3ip . we can then 
conclude that, up to a constant in the restricted isometry level 5, 

Wcsoh < e + \og(N d )( 1 ^-). (34) 
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Combining this bound with the ^2-tail bound (|32p and recalling that the Haar transform H : N d — > 
N d is an isometry, 

||v|| 2 = llW-^Ha = ||c|| 2 < ||c 5o ||2 + \\c Ss h < £ + log(AT d )(M|^), (35) 
which completes the proof. 



□ 



5.1 Proof of the Main Theorem 



Because we proved the bounds (i) and (ii) from the main theorem concerning stable gradient 
recovery in Section [H it remains only to prove the signal recovery error bound (iii). 

By feasibility of both x and x for the constraints in the total variation minimization program, 
the signal error v = x — x obeys the tube-constraint ||^4(v)||2 < 2e. Applying Theorem [6] and the 
total variation bound (ii) yields 

\\X - X\\ 2 = 1 1 V 1 1 2 

< e + log(iV d ) 



< e + log(JV 



\Vx - (Vx) s ||i + ^/s\fde 
\/sd 



<lo 8 (iV)( » Va! -j_ V3! »'»' + v5 t 

The proof completes. 

A Derivation of Proposition [2] 

Recall that the space L P (Q) (1 < p < oo) for ft C M. d consists of all functions / satisfying 

l/p 



L P V) = ( \f(u)\ p du) P <oo. 



The space BV(Q) of functions of bounded variation over the unit cube Q = [0, l) d is often used 
as a continuous model for natural images. Recall that a function / G L\(Q) has finite bounded 
variation if and only if its distributional gradient is a finite measure, and this measure generates 
the BV seminorm More precisely, 

Definition 8. For a vector v £ M. d , we define the difference operator A v in the direction of v by 

A v (f,x) := f(x + v)- f(x). 

We say that a function f £ L\(Q) is in BV(Q) if and only if 

d d 
VqU) = suvh- l ^\\A he] {f,-)\\ Ll{Q{he])) = lim [h' 1 ^ IIA^CMIIl^q^)) < oo 

where ej denotes the jth coordinate vector. The function Vq(/) provides a semi-norm for BV (Q) : 

|/|w(Q) = VQ(f). 
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In particular, piecewise constant functions are in the space BV(Q). For N = 2 n a power of two, 
recall the following relationship between the total-variation of a multidimensional signal x G C Nd 
and the bounded variation of its isometric piecewise-constant representation / G Y<n C -^(Q) as 
defined in (TIU1): 



Lemma 9. Lei N = 2 n for some integral n. Let x G and let f £ Stv &e its isometric embedding 
as in CED. T/ien < iY- d / 2+1 ||a;|| T y. 



Proof. For h < jj 



N ■ 



Afce fc (/, («)) 



where 



0, 



else. 



h < m < 



N " 



(fe) _ / 4 ^ 



Thus 



k=l 
d 

fc=l 

< N- d / 2+1 \ 



N 

X\\TV 



□ 



Cohen, Dahmen, Daubechies, and DeVore showed in [TT] that the properly normalized sequence 
of rearranged wavelet coefficients associated to a function / G L2{£1) of bounded variation is 
in weak-ii, and its weak-£i seminorm is bounded by the function BV seminorm. Using different 
normalizations to those used in [11] — we use the L2-normalization for the Haar wavelets as opposed 
to the Li-normalization — we consider the Haar wavelet coefficients ff = {f, hf) and consider the 



wavelet coefficient block // = (ff) e eE £ 



associated to those Haar wavelets supported on the 



dyadic cube /. With this notation, Theorem 1.1 of [11] applied to the Haar wavelet system over 
L2(Q) reads 

Proposition 10. Let d > 2. Then there exists a constant C > such that the following holds for 
all f G BV(Q). Let the wavelet coefficient block with kth largest l^-norm be denoted by f^ k \ and 
suppose that this block is associated to the dyadic cube L-^ with side-length 2~ 3 . Then 



11/ 



(*)| 



< C 



2 j(d-2)/2|y| 
k 



BY 



Proposition [2] results by translating Proposition [TU] to the discrete setting of C^ d using the 
isometry (|10p and Lemma [H We note that a stronger version of this result was provided for 
the 2-dimensional Haar wavelet basis in [12] and used in the proof of stable image recovery from 
total- variation minimization in 1281. 
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